Implement Library README Discovery, Backfilling, and Publishing (#242) #380
SurbhiAgarwal1 wants to merge 35 commits into
…pen-telemetry#242)
- Extend InventoryManager to discover and load library READMEs from registry
- Augment instrumentation metadata with markdown_hash and enable backfilling
- Implement markdown publishing to public data directory in DatabaseWriter
- Add frontend types and API support for README lazy loading
✅ Deploy Preview for otel-ecosystem-explorer ready!
Pull request overview
Adds end-to-end support for per-library README markdown assets in the Java agent ecosystem: discovery from the registry, propagation/backfilling into instrumentation metadata, publishing to the explorer’s public data directory, and a frontend API hook to lazy-load the markdown.
Changes:
- Java instrumentation watcher: discover and load library README markdown assets from `library_readmes/`.
- Explorer DB builder: inject/backfill `markdown_hash` into library/custom items and publish markdown files to `public/data/javaagent/markdown/`.
- Frontend: extend Javaagent instrumentation types with `markdown_hash` and add an API helper to fetch README markdown on demand.
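
The publishing step in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's `DatabaseWriter` code: the function name `publish_markdown` and its signature are hypothetical, and the `{library-name}-{hash}.md` filename layout is taken from the PR description. It assumes the hash arrives from the registry filename rather than being recomputed.

```python
from pathlib import Path
from typing import Optional


def publish_markdown(
    markdown_dir: Path,
    library_name: str,
    markdown_hash: str,
    content: Optional[str],
) -> Optional[Path]:
    """Write one README to a content-addressed path (hypothetical helper)."""
    # Explicit None check: an empty string is still publishable content.
    if content is None:
        return None
    target = markdown_dir / f"{library_name}-{markdown_hash}.md"
    # Dedup: the hash is part of the name, so an existing file is assumed
    # to hold the same content and is left untouched.
    if target.exists():
        return target
    markdown_dir.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```

Because the hash is part of the filename, publishing the same README for several releases produces a single file, which is what makes the dedup check cheap.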
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| ecosystem-explorer/src/types/javaagent.ts | Adds markdown_hash to InstrumentationData for README association. |
| ecosystem-explorer/src/lib/api/javaagent-data.ts | Adds loadLibraryReadme() for lazy-loading published markdown assets. |
| ecosystem-automation/java-instrumentation-watcher/src/java_instrumentation_watcher/inventory_manager.py | Adds README map discovery + content loading + filename parsing for library_readmes/. |
| ecosystem-automation/java-instrumentation-watcher/src/java_instrumentation_watcher/__init__.py | Makes __version__ resilient when package metadata isn’t installed. |
| ecosystem-automation/explorer-db-builder/src/explorer_db_builder/metadata_backfiller.py | Backfills markdown_hash across versions. |
| ecosystem-automation/explorer-db-builder/src/explorer_db_builder/main.py | Publishes READMEs and augments inventories with markdown_hash before backfill/write. |
| ecosystem-automation/explorer-db-builder/src/explorer_db_builder/database_writer.py | Writes markdown assets to a content-addressed public directory with dedup. |
| ecosystem-automation/explorer-db-builder/src/explorer_db_builder/__init__.py | Makes __version__ resilient when package metadata isn’t installed. |
| ecosystem-automation/collector-watcher/src/collector_watcher/__init__.py | Makes __version__ resilient when package metadata isn’t installed. |
jaydeluca
left a comment
looks great! tested things locally and everything looks good. Just a few suggestions from myself and copilot related to some of the logic
- Implement deterministic README selection in InventoryManager (mtime + lexicographical).
- Tighten README filename regex to enforce 12-char hashes.
- Use explicit None checks for README content publishing.
- Update loadLibraryReadme to use fetchWithCache with status reporting.
- Add unit and integration tests for README publishing and backfilling.
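
To illustrate the regex tightening mentioned in this commit, the sketch below compares the loose pattern quoted in the PR description with a stricter one that requires exactly 12 hex characters. The strict pattern and the helper name `parse_readme_filename` are assumptions about what the change might look like, not the committed code.

```python
import re
from typing import Optional, Tuple

# Loose pattern quoted in the PR description: any run of hex characters.
LOOSE = re.compile(r"^(.*)-([a-f0-9]+)\.md$")
# Tightened variant (assumed form): exactly 12 lowercase hex characters.
STRICT = re.compile(r"^(.+)-([a-f0-9]{12})\.md$")


def parse_readme_filename(filename: str) -> Optional[Tuple[str, str]]:
    """Split a README filename into (library name, hash), or return None."""
    match = STRICT.match(filename)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

With the loose pattern, a name such as `kafka-clients-0.11-abc.md` would be read as library `kafka-clients-0.11` with hash `abc`; the strict pattern rejects it because the trailing segment is not 12 characters long.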
jaydeluca
left a comment
looks good, can you just resolve the merge conflict?
…ublic/data from markdown lint
jaydeluca
left a comment
@SurbhiAgarwal1 the changes look good. I pushed a change to remove the generated files from the PR because it's best to have that added as part of an automation-generated PR to keep the changes easier to review/go back to.
It looks like there might be some other conflicts with main / other files unrelated that are showing changes in the diff. Could you look into that? then this will be good to go, thanks
Hi @jaydeluca, I have stabilized the build for this PR and resolved the regressions in the frontend test suite. Are there any other changes needed?
Implementation Detail: Library README Support (#242)
This document provides a technical breakdown of the changes implemented to support library README markdown files in the OpenTelemetry Ecosystem Explorer.
1. Registry Discovery & Extraction
Component: `java-instrumentation-watcher`
File: `inventory_manager.py`
The `InventoryManager` was enhanced to handle the `library_readmes/` directory within the versioned registry.
- Filenames such as `apache-httpclient-4.3-a3bae406cfcf.md` are parsed using `r"^(.*)-([a-f0-9]+)\.md$"` to separate the library name from its content hash.
- `load_library_readme_map(version)` caches this relationship, allowing the database builder to quickly correlate instrumentations to their READMEs.
2. Metadata Augmentation & Backfilling
Component: `explorer-db-builder`
Files: `main.py`, `metadata_backfiller.py`
We integrated README support into the existing content-addressed metadata system.
- The `main.py` orchestrator augments both `libraries` and `custom` instrumentation metadata with the `markdown_hash` during the initial inventory load.
- By adding `markdown_hash` to `BACKFILLABLE_FIELDS`, the system ensures that older release versions (where the README might be missing in the registry) correctly inherit the link from newer versions if the library name matches.
3. Database Writing & Publishing
Component: `explorer-db-builder`
File: `database_writer.py`
The `DatabaseWriter` manages the physical transfer of assets to the explorer's public storage.
- Markdown assets are written to `/public/data/javaagent/markdown/{library-name}-{hash}.md`.
4. Frontend Integration
Component: `ecosystem-explorer`
Files: `src/types/javaagent.ts`, `src/lib/api/javaagent-data.ts`
Prepared the React frontend for content rendering:
- Added `markdown_hash?: string` to the `InstrumentationData` interface.
- Added `loadLibraryReadme(libraryName, markdownHash)` to fetch markdown content from the public directory on demand.
5. Build System & DX Improvements
Component: Multiple Watchers
Files: `__init__.py`
Modified the initialization logic to handle `PackageNotFoundError`. This allows developers to run the builder and tests directly from source in uninstalled environments (common in dev/CI containers).
6. Verification Summary
- Tests run with `pytest ecosystem-automation/`.
- Linting checked with `ruff`.
- Verified that `v2.26.1` correctly backfilled README links from `v2.27.0` and that the `public/data` directory contains the correct assets.
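
The backfill check in the verification summary (`v2.26.1` inheriting from `v2.27.0`) can be sketched as below. This is a simplified model, not the code in `metadata_backfiller.py`: the inventory shape (version mapped to a list of dicts with a `name` key), the helper names, and the plain numeric version sort are all assumptions. Only the `markdown_hash` field name and the `BACKFILLABLE_FIELDS` constant come from the PR description.

```python
from typing import Dict, List, Tuple

# Only the field name is from the PR; the tuple shape is assumed.
BACKFILLABLE_FIELDS = ("markdown_hash",)


def version_key(version: str) -> Tuple[int, ...]:
    """Sort key for plain 'vX.Y.Z' strings (pre-release tags not handled)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def backfill(inventories: Dict[str, List[dict]]) -> None:
    """Fill missing fields in older versions from the nearest newer one, in place."""
    nearest_newer: Dict[Tuple[str, str], object] = {}
    # Walk newest to oldest so each version sees values from newer releases.
    for version in sorted(inventories, key=version_key, reverse=True):
        for item in inventories[version]:
            for field in BACKFILLABLE_FIELDS:
                key = (item["name"], field)
                if item.get(field) is not None:
                    nearest_newer[key] = item[field]
                elif key in nearest_newer:
                    # Same library name, value missing here: inherit it.
                    item[field] = nearest_newer[key]


inventories = {
    "v2.27.0": [{"name": "apache-httpclient-4.3", "markdown_hash": "a3bae406cfcf"}],
    "v2.26.1": [{"name": "apache-httpclient-4.3"}, {"name": "jdbc"}],
}
backfill(inventories)
```

After the call, the `v2.26.1` entry for `apache-httpclient-4.3` carries the hash from `v2.27.0`, while `jdbc` stays without one because no newer version supplies it.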